Style Breach Detection: An Unsupervised Detection Model

نویسنده

  • Jamal Ahmad Khan
چکیده

This paper deals with the sub-task of PAN 2017 Author Identification, which is to detect style breaches for unknown number of authors within a single document in English. The presented model is an unsupervised approach that will detect style breaches and mark text boundaries on the basis of different stylistic features. This model will use some classical stylistic features like POS analysis and sentence lexical analysis. Also some new features naming common English word frequencies within sentence text, sentence expression and sentence attitude have been proposed. The new features may not be directly linked to author’s style of writing but to the subject/topic of sentence under analysis. Moreover the model uses sentence window for style detection. The sentence window may be extended to neighboring sentences during its unsupervised analysis.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Overview of the Author Identification Task at PAN-2017: Style Breach Detection and Author Clustering

Several authorship analysis tasks require the decomposition of a multiauthored text into its authorial components. In this regard two basic prerequisites need to be addressed: (1) style breach detection, i.e., the segmenting of a text into stylistically homogeneous parts, and (2) author clustering, i.e., the grouping of paragraph-length texts by authorship. In the current edition of PAN we focu...

متن کامل

Alert correlation and prediction using data mining and HMM

Intrusion Detection Systems (IDSs) are security tools widely used in computer networks. While they seem to be promising technologies, they pose some serious drawbacks: When utilized in large and high traffic networks, IDSs generate high volumes of low-level alerts which are hardly manageable. Accordingly, there emerged a recent track of security research, focused on alert correlation, which ext...

متن کامل

Style Breach Detection with Neural Sentence Embeddings

The paper investigates method for the style breach detection task. We developed a method based on mapping sentences into high dimensional vector space. Each sentence vector depends on the previous and next sentence vectors. As main architecture for this mapping we use the pre-trained encoder-decoder model. Then we use these vectors for constructing an author style function and detecting outlier...

متن کامل

Unsupervised Sequential Information Bottleneck Clustering For Building Anomaly Based Network Intrusion Detection Model

In this paper we present a novel approach to unsupervised clustering in building an efficient anomaly based network intrusion detection model. The method is based on a recently introduced sequential information bottleneck (sIB) principle. KDDCup 1999 intrusion detection benchmark dataset is used for the experimentation of our proposed technique. The experimental results demonstrate that the pro...

متن کامل

Fast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies

Collecting insurance fraud samples is costly and if performed manually is very time consuming. This issue suggests usage of unsupervised models. One of the accurate methods in this regards is Spectral Ranking of Anomalies (SRA) that is shown to work better than other methods for auto insurance fraud detection specifically. However, this approach is not scalable to large samples and is not appro...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017